Director, MLOps Engineering
Los Angeles, CA Direct-Hire $180000.00 - $225000.00 Onsite

Job Description

Director, MLOps Engineering

Employment: Direct-Hire/Full-Time

Workplace Environment: Onsite

Location: Century City, CA

Industry: Financial Services

Compensation: $180,000 - $225,000

SUMMARY:

The Senior MLOps Engineer leads the design and maintenance of scalable, secure infrastructure for ML model deployment and lifecycle management. The role ensures models transition from development to production while meeting regulatory and compliance standards and guidelines. The Senior MLOps Engineer collaborates closely with Data Science, Engineering, Master Data Management, and other enterprise operations and business vertical teams to accelerate ML-driven insights, enhance model accuracy, and govern and monitor the ML ecosystem. Beyond technical execution, the role defines MLOps strategy and architecture, addressing the "last mile" challenge of AI value realization by automating and scaling ML models as tangible business assets. Reporting to the Head of Enterprise Data Management, this role serves as a key pillar that enhances efficiency, boosts model accuracy, accelerates time-to-market for new solutions, and ensures the scalability and robust governance of machine learning initiatives.

RESPONSIBILITIES:

  • ML Model Deployment & Management: Lead the design, implementation, and ongoing maintenance of scalable ML infrastructure. The infrastructure will primarily reside on leading cloud services to facilitate the seamless deployment and efficient scaling of ML models. The engineer will oversee the development of the MLOps platform and automated pipelines specifically designed for deploying, monitoring, and maintaining models within production environments. A critical aspect of this responsibility includes implementing robust solutions for model versioning, systematic retraining, and comprehensive artifact management, treating each model as a distinct artifact that requires meticulous building, testing, deployment, and ongoing management throughout its lifecycle.
  • Automation & CI/CD Pipelines: Design and implement extensive automation across the ML workflow, covering model training, rigorous testing, thorough validation, and efficient deployment. This includes setting up robust Continuous Integration/Continuous Delivery (CI/CD) pipelines for both model training and deployment, leveraging industry-standard tools. Automate complex data and model workflows using orchestration tools.
  • Monitoring, Performance & Reliability: Implement comprehensive monitoring and alerting systems. These systems are crucial for real-time tracking of model performance, assessing data quality, and ensuring overall system health. Utilize specialized tools to proactively detect critical issues like model drift, data quality anomalies, and performance degradation. A significant part of the daily work involves troubleshooting issues within production environments, including debugging model deployment failures or addressing instances of inaccurate predictions caused by mismatches in input data.
  • Data & Feature Engineering Support: Build and maintain sophisticated feature stores. Ensure precise alignment between training and inference data pipelines, thereby preventing data leakage and ensuring consistency. Collaborate with data engineers to build robust Extract, Transform, Load (ETL) pipelines that feed into data lakehouses. The engineer will also ensure dataset reliability through robust versioning practices and seamless data integration processes.
  • Security & Compliance: Integrate robust security measures directly into MLOps pipelines. Collaborate with other operations and enterprise functions to set up and monitor processes to actively mitigate a wide array of risks, including exploitation attacks, access abuse, pipeline infrastructure vulnerabilities, data integrity compromises, and model integrity attacks.
  • Collaboration & Mentorship: Collaborate extensively with data scientists, data engineers, and DevOps teams to ensure the seamless integration of machine learning solutions into the firm's products and operations. Provide technical support and guidance to other team members, for example by refactoring data scientists' Python code to strengthen their programming skills and ensure production readiness. Drive strategic initiatives, troubleshoot complex cross-domain issues, and ensure that "Trustworthy AI" guardrails are meticulously implemented.

REQUIRED EXPERIENCE & SKILLS:

  • Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
  • 7 years of experience as an MLOps Engineer or in a similar role.
  • Expert-level proficiency in Python is essential, complemented by strong skills in Bash scripting. Familiarity with other relevant languages such as Java, Go, Ruby, or C++ is highly advantageous, particularly for performance-critical applications.
  • Extensive experience in designing and implementing cloud solutions on major platforms, such as Azure or GCP, is required.
  • Deep expertise with Docker and Kubernetes for containerizing and orchestrating complex ML workloads.
  • Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, GitLab CI, or CircleCI.
  • Familiarity with widely used machine learning frameworks, including TensorFlow, PyTorch, Scikit-learn, Keras, XGBoost, or AutoGluon.
  • Proficiency with SQL and practical experience with data platforms such as Databricks and MongoDB are required.
  • Experience with workflow orchestration tools like Airflow or Prefect and monitoring tools such as Prometheus, Grafana, WhyLabs, or Evidently AI is required.
  • Experience with generative AI use cases and relevant engineering practices is highly desirable. Relevant cloud certifications, such as Google Cloud Professional Machine Learning Engineer or Azure AI Engineer Associate, are preferred. Certifications in related fields like DevOps or Data Engineering further demonstrate a comprehensive understanding of the operational landscape.
  • Prior experience within the financial services, fintech, or private equity sectors is highly advantageous.
  • Experience with model serialization techniques, such as ONNX, and expertise in inference optimization are highly valued.
  • A solid understanding of feature stores and the ability to ensure precise alignment between training and inference data are also critical.

All qualified applicants will receive consideration for employment without regard to race, color, national origin, age, ancestry, religion, sex, sexual orientation, gender identity, gender expression, marital status, disability, medical condition, genetic information, pregnancy, or military or veteran status. We consider all qualified applicants, including those with criminal histories, in a manner consistent with state and local laws, including the California Fair Chance Act, City of Los Angeles' Fair Chance Initiative for Hiring Ordinance, and Los Angeles County Fair Chance Ordinance.

Job Reference: JN -022026-415693